Distributed XML processing: Theory and applications

Authors

  • Dirceu Cavendish
  • K. Selçuk Candan
Abstract

Basic message processing tasks, such as well-formedness checking and grammar validation, common in Web service messaging, can be off-loaded from the service providers' own infrastructures. The traditional way to alleviate the overhead caused by these tasks is to use firewalls and gateways. However, these single-processing-point solutions do not scale well. To enable effective off-loading of common processing tasks, we introduce the Prefix Automata SyStem (PASS), a middleware architecture which distributively processes the XML payloads of web service SOAP messages during their routing towards Web servers. PASS is based on a network of automata, in which PASS-nodes independently but cooperatively process parts of the SOAP message XML payload. PASS allows autonomous and pipelined in-network processing of XML documents, where parts of a large message payload are processed by various PASS-nodes in tandem or simultaneously. The non-blocking, non-wasteful, and autonomous operation of PASS middleware is achieved by relying on the prefix nature of basic XML processing tasks, such as well-formedness checking and DTD validation. These properties ensure minimal distributed processing management overhead. We present necessary and sufficient conditions for outsourcing XML document processing tasks to PASS, and provide guidelines for rendering suitable applications PASS-processable. We demonstrate the advantages of migrating XML document processing, such as well-formedness checking, DTD parsing, and filtering, to the network via event-driven simulations. © 2008 Elsevier Ltd. All rights reserved.

1. Motivation and related work

As web service standardization efforts mature and many institutions embrace these standards and the available services as a means to reduce their development and maintenance costs, web services are becoming ubiquitous and irreplaceable components of (e-)businesses. Consequently, the end-to-end delivery, processing, and routing of web service requests (and the underlying SOAP messages [40]) have a significant impact on the response times observed by end-users of these services. Most commercial web sites pay premium prices for solutions that help them reduce their response times as well as their risk of failure when faced with high access rates. Delivering a web service requires careful design in many respects: security, robustness, and sustained performance are three of these issues. A web service request and its reply consume resources both in public networks (SOAP header processing and routing) and in the service provider's own network and servers. A typical Web service provider might have to allocate significant resources (in terms of network, memory, CPU, as well as I/O) for verifying and processing each incoming service request. Consequently, a high request load may translate into poor service delivery performance for the Web service provider. In fact, a denial of service attack, where a Web service provider fails to deliver services due to malicious, invalid, and unauthorized accesses to the system, is an extreme case of service degradation due to resource limitations. Generally, the above challenges are addressed by two orthogonal but complementary mechanisms: back-end solutions and front-end solutions.
Many services benefit from both back-end solutions (such as load-balanced server farms) as well as front-end solutions (including edge-caches, proxy-servers, in-network query filters and query routers, and distributed view managers). Most high-volume sites typically deploy a large number of servers and employ hardware- or software-based load balancing components to reduce the response time of their back-end servers. Although they guarantee some protection against surges in demand, such localized solutions cannot help reduce the delay introduced in the network during the transmission of the content to end-users. To alleviate this problem, content providers also replicate or mirror their content at edge caches, i.e. caches that are close to end-users. If data or processing can be placed in a proxy server or cache closer to end-users, then when a user requests the service, it can be delivered promptly from the cache without additional communication with the Web server, reducing the response time. This approach also reduces the load on the original source, as some of the requests can be processed without accessing it. Thus, to protect themselves against performance degradation when facing high request loads, Web service providers use solutions such as replication [2,1,19,35], caching [35,9,11,43,6,41], and even off-loading to content- and application-delivery networks [2,26,31]. Unfortunately, many services (especially transaction-oriented and data-driven ones) cannot easily benefit from caching and off-loading, as business data and transactions might require local processing. Therefore, in order to prevent the service provider's throughput from being negatively affected by high request volumes, any wastage of Web service provider resources should be avoided.

Fig. 1. The common processing tasks that can be offloaded include message validation (e.g., well-formedness checking, DTD validation) and filtering/content-based authorization.

Despite the challenges associated with off-loading entire web services, we note that there are parts of service request processing that are common across many web service delivery and service composition deployments. As shown in Fig. 1, most service requests (even those that are transaction-oriented and/or database-driven) consume service providers' resources for highly common and mundane tasks, such as XML message validation (well-formedness checking, DTD grammar validation), as well as various web service value-added features (e.g., content-based message filtering). Since few sharing and caching opportunities apply, such low-level document processing tasks are quickly becoming a bottleneck in service delivery infrastructures. Indeed, XML parsing is now recognized to be a bottleneck in XML processing. Valuable network and CPU resources are wasted and service delays are incurred because these tasks have to be processed by the service provider or (at best) by XML appliances deployed at the edge of the service provider's infrastructure (including XML firewalls, such as the WebSphere DataPower Security Gateway XS40 [15] and the Sarvega Security Gateway [37], and XML accelerators, such as the WebSphere DataPower XA35 XML Accelerator [16]). Naturally, such solutions, which delay XML processing until the edge of the service provider's infrastructure, are subject to bottlenecks, single points of failure, and denial of service attacks. More importantly, they still waste valuable network resources.
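To make the off-loaded tasks concrete, the following minimal sketch (ours, not code from the paper) shows why well-formedness checking has the prefix nature the abstract relies on: a streaming checker's entire state is a stack of open element names, so any prefix of a document can be checked, and rejected early, without seeing the rest. The `WellFormednessChecker` class and its chunked `feed` API are illustrative assumptions, and the lexer handles tags only (no attributes, comments, CDATA, or entities; chunks are assumed to split at tag boundaries).

```python
import re

# Matches start tags, end tags, and self-closing tags; attributes, comments,
# CDATA, and entities are deliberately out of scope for this sketch.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)\s*(/?)>")

class WellFormednessChecker:
    """Streaming tag-matching automaton; its only state is a stack."""

    def __init__(self):
        self.stack = []  # names of currently open elements

    def feed(self, chunk):
        """Consume the next fragment (assumed to split at tag boundaries);
        returns False as soon as a nesting violation is seen."""
        for close, name, selfclose in TAG.findall(chunk):
            if selfclose:            # <name/> opens and closes itself
                continue
            if close:                # </name> must match the innermost open tag
                if not self.stack or self.stack.pop() != name:
                    return False
            else:                    # <name> opens a new element
                self.stack.append(name)
        return True

    def done(self):
        """At end of document, no element may remain open."""
        return not self.stack

checker = WellFormednessChecker()
ok = checker.feed("<order><item>id-1") and checker.feed("</item></order>")
print(ok and checker.done())  # True: every tag matched, stack empty at end
```

Because the checker never looks ahead, a malformed message can be dropped as soon as the violating prefix arrives, rather than after the whole payload has been buffered at a gateway.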
There is, in fact, an increasing number of XML message processing off-loading technologies that provide support for processing individual requests. However, these technologies are either high-level (usually proxy-based) publish/subscribe solutions (such as [42], JMS [24], XPASS [34], NiagaraCQ [12], SemCast [33], and CoDD [3]), or purely network-level intelligent message routing solutions (such as the WebSphere DataPower Integration Appliance XI50 [18] and the Sarvega XML Context Router [36]), which do not go beyond interpreting the request and reply message headers and do not support XML document-level processing. With recent advances in network processors, on the other hand, hardware-assisted in-network XML processing has started to be investigated. The authors of [29] propose using Network Processors (NPs) to execute XML processing. NPs can be placed at various points in the network, from edge routers, for caching and firewall applications, to routers, to "move application services into the network". Once application services are identified, they are mapped onto the NP pipeline structure (microengines) for efficient processing. In a related work, [14] investigates XML DOM parsers that can be implemented in networked embedded systems with real-time requirements. Their approach involves pre-allocation of memory objects, so that dynamic memory allocation, with its unpredictable performance, is avoided. That work underlines the importance of efficient parsing and document validation tasks within XML processing. Since XML data is expected to become a significant part of network load [32], the development of scalable in-network solutions that can save both network and server resources, as well as eliminate spurious/faulty messages within the network, becomes essential.
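The pipelined, in-network processing that PASS targets can be illustrated with a small end-to-end sketch. The fragment list and hand-off function below are hypothetical (the paper develops the actual PASS-node mechanics in later sections); the point is that the only state a node must forward downstream is the automaton's open-tag stack, which is typically tiny compared with the payload itself.

```python
import re

TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)\s*(/?)>")  # same tag-only lexer as above

def process_fragment(fragment, stack):
    """Run the tag-matching automaton over one payload fragment, resuming
    from the open-tag stack handed over by the upstream node.
    Returns (ok, stack_to_forward)."""
    stack = list(stack)  # each node works on its own copy of the state
    for close, name, selfclose in TAG.findall(fragment):
        if selfclose:
            continue
        if close:
            if not stack or stack.pop() != name:
                return False, stack  # violation detected at this node
        else:
            stack.append(name)
    return True, stack

# Three hypothetical PASS-nodes along the route each check one fragment;
# only the open-tag stack travels between them alongside the message.
fragments = ["<env><body><msg>", "hello</msg>", "</body></env>"]
state = []
for node_id, fragment in enumerate(fragments):
    ok, state = process_fragment(fragment, state)
    print(f"node {node_id}: ok={ok}, open tags={state}")
    if not ok:
        break  # the message can be dropped in-network, before the server
print("well-formed:", ok and not state)  # True once the last node finishes
```

Each node sees only its own fragment; the message and the small stack travel together, so fragments can be checked in tandem along the route and a faulty message never reaches the server.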


Similar articles

Bioinformatic Application Integration with Distributed Server Approach

Many bioinformatic applications are computationally intensive. While many have suggested using multiprocessor machines or distributed processing to handle computationally intensive programs, porting existing applications to these architectures and managing them is in general difficult. This paper proposes a distributed server approach which adapts the concept of peer-to-peer communica...

Full text

Protocols and Architectures on the Platform for the Distributed Applications

The paper outlines a distributed application platform, which exists within distributed processing environments and implements different kinds of technologies for realising these applications. On the intermediary platform, between the network and application layers, the right technology needs to be selected. A Web-based approach with the XML protocol and a middleware approach with the CORBA architecture ...

Full text

Metadata Services for Distributed Event Stream Processing Agents

Enterprise-level applications are becoming complex with the need for event and stream processing, multiple query processing and data analysis over heterogeneous data sources such as relational databases and XML data. Such applications require access to the metadata information for these different data sources. This paper discusses the design and implementation of a service-based dynamic metadata...

Full text

A Framework for Distributed XML Data Management

As data management applications grow more complex, they may need not only efficient distributed query processing, but also subscription management, data archival, etc. To enact such applications, the current solution consists of stacking several systems together. The juxtaposition of different computing models prevents reasoning on the application as a whole, and wastes important opportunities to improve...

Full text

Declarative Development of Distributed Applications

Apart from traditional usage scenarios such as online shopping and browsing, the web continues to evolve into an active platform for distributed applications, e.g. implementing business processes. Standardized protocols and technologies, including Web Services, RSS/Atom feeds and REST, provide the communication infrastructure for the involved systems. They allow the integration of heterogeneous c...

Full text

Active XML (AXML) research: Survey on the representation, system architecture, data exchange mechanism and query evaluation

Active XML (AXML) is an extension of XML to exploit the powerful computation ability of peer-to-peer networks and Web services technologies. AXML is considered a distributed XML DBMS which extends the capability of XML by embedding intensional XML data inside XML documents. The management of intensional XML and XML data together in XML documents raises issues such as representation for intension...

Full text


Journal:
  • J. Parallel Distrib. Comput.

Volume 68, Issue -

Pages 1054-1069

Publication year: 2008